[57] ABSTRACT

A method and apparatus for providing mirrored site admin- 
istrators with the number of hits from a proxy's document 
cache and for dispatching document requests in a proxy to 
more efficiently allocate the document cache space within 
the proxy are provided. A proxy includes a document cache 
storing recently requested documents. The proxy is coupled 
to a client and to a remote server. The proxy maintains 
information regarding requests from the client that are 
serviced from the proxy's document cache such as the 
Uniform Resource Locator (URL) of the requested docu- 
ment and the number of cached responses. This information 
is provided by the proxy to a remote site administrator. In 
this manner, remote site administrators can more accurately 
track total hits. 

16 Claims, 10 Drawing Sheets 
METHOD AND APPARATUS FOR
PROVIDING REMOTE SITE 
ADMINISTRATORS WITH USER HITS ON 
MIRRORED WEB SITES 

CROSS-REFERENCES TO RELATED 
APPLICATIONS 

The present application is a continuation-in-part of 
co-pending U.S. patent application entitled, "Method and 
Apparatus for Providing Proxying and Transcoding of 
Documents in a Distributed Network," having application 
Ser. No. 08/656,924, and filed on Jun. 3, 1996. 

FIELD OF THE INVENTION 

The invention relates generally to the field of client-server 
computer networking. More specifically, the invention 
relates to mirroring of Web sites, notifying mirrored site 
administrators of hits, and allocation of the Web's content 
among mirroring servers based upon the Uniform Resource 
Locator (URL). 

BACKGROUND OF THE INVENTION 

World Wide Web (Web) documents are commonly written 
in HTML (Hypertext Mark-up Language). HTML docu- 
ments typically reside on Web servers and are requested by 
Web clients. Delays during Web browsing are often introduced
by, for example, heavy communications traffic on the Internet
or the slow response of a remote site. Providing one
or more servers for mirroring Web sites located on remote 
servers is one means of reducing delays involved with 
browsing the Web. These mirroring servers, typically 
referred to collectively as a "proxy" or individually as 
"proxy servers," store frequently accessed Web sites in a 
local cache, thereby eliminating recurrent retrievals of com- 
monly accessed documents. Thus, when a request for a 
particular Web page is received from a client, the proxy 
server associated with the particular client looks first to its 
local cache to service the request rather than the remote site 
upon which the Web page resides. If the requested document 
is found locally, the request can be serviced by the proxy 
server and a subsequent request to the remote server for the 
document can be avoided. Therefore, only when a valid copy 
of the requested document is not in the proxy's local cache 
would the remote server need to be accessed. In this manner, 
exposure to heavy communications traffic on the Internet 
and the slow response of remote servers can be reduced.
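By way of illustration only, the cache-first behavior described above can be sketched in Python. The class name CachingProxy and the fetch_remote callable are hypothetical, and the sketch assumes that every cached copy is valid; a real proxy would also check the freshness of a cached document before serving it.

```python
from typing import Callable, Dict


class CachingProxy:
    """Cache-first proxy sketch: serve locally when possible, else fetch and cache."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self.fetch_remote = fetch_remote   # hypothetical function that contacts the remote server
        self.cache: Dict[str, bytes] = {}  # URL -> cached document body
        self.remote_fetches = 0            # number of requests that reached a remote server

    def get(self, url: str) -> bytes:
        cached = self.cache.get(url)
        if cached is not None:
            # Cache hit: the remote server never sees this request.
            return cached
        # Cache miss: retrieve the document from the remote site.
        self.remote_fetches += 1
        body = self.fetch_remote(url)
        # Store the document to anticipate subsequent requests.
        self.cache[url] = body
        return body


if __name__ == "__main__":
    proxy = CachingProxy(lambda url: ("<html>" + url + "</html>").encode())
    proxy.get("http://www.companyA.com/index.html")
    proxy.get("http://www.companyA.com/index.html")
    print(proxy.remote_fetches)  # 1
```

In the usage shown, the client makes two requests but the remote server observes only one, which is the hit-tracking difficulty discussed in the following paragraph.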

While this mirroring approach is beneficial to end-users, 
it makes hit tracking for remote site administrators difficult. 
A hit is a request for a Web page, typically initiated by a user 
selecting a hypertext link for the Web page. The mirroring 
approach discussed above disrupts a remote server's ability 
to track the total number of requests for a given Web page 
because, as discussed above, some of the requests are 
intercepted and serviced by proxy servers. It is desirable to 
have an accurate count of requests for a given Web page or 
group of pages to track the relative popularity of a page, for
example, or to provide feedback to advertisers whose
advertisements appear on the page. Therefore, what is needed is
a mechanism for tracking user hits by the proxy and a 
mechanism for notifying mirrored sites, thereby allowing 
remote site administrators to accurately track total hits (i.e., 
those requests serviced from a proxy's local cache and the 
requests serviced by the remote server). 

Another problem with the current mirroring approach is 
the inefficient allocation of the proxy's cache space. 
Currently, each client is assigned to one or more proxy
servers. Therefore, the documents most recently requested
by each active client will reside in the corresponding proxy
server's cache. If two or more clients assigned to different
proxy servers have recently requested the same document,
the same document might be cached in several of the proxy
servers, thereby reducing the cache storage space available
for other frequently requested documents. Further, one or
more extremely popular documents might potentially be
cached in each proxy server. While redundancy of
information is useful for fault tolerance, organized
redundancy would be preferable. Given the foregoing, what
is needed is a means of more efficiently allocating cache
space within a proxy. Specifically, it would be desirable to
allocate mutually exclusive portions of the Web's content to
particular proxy servers.

SUMMARY OF THE INVENTION 

A method and apparatus are described for providing
mirrored site administrators with the number of hits from a
proxy's document cache and for dispatching document
requests in a proxy to more efficiently allocate the document
cache space within the proxy. A proxy includes a document
cache storing recently requested documents. The proxy is
coupled to a client and to a remote server. The proxy
maintains information regarding requests from the client that
are serviced from the proxy's document cache. This
information is provided by the proxy to a remote site
administrator. In this manner, remote site administrators can
more accurately track total hits (i.e., those requests serviced
from a proxy's document cache plus the requests serviced by
the remote server itself).

According to another aspect of the present invention, a
proxy implements a dispatching scheme for client requests
that results in a more efficient allocation of the proxy's
document cache space. The proxy receives a document
request from a client. A Uniform Resource Locator (URL) is
included in the document request. The proxy forwards the
request to one of a plurality of proxy servers based upon the
URL.

According to another aspect of the present invention, the
proxy performs a hash function on the URL that maps the
URL to exactly one of the plurality of proxy servers.
Advantageously, in this manner, mutually exclusive portions
of the Web's content can be allocated to particular proxy
servers.
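As a non-authoritative sketch of the URL-to-server mapping described above, a stable hash of the URL, reduced modulo the number of proxy servers, selects exactly one server for each URL. The function name and the server names are hypothetical, and SHA-256 is an assumed choice of hash function; Python's built-in hash() is avoided because it is randomized per process for strings, which would send the same URL to different servers after a restart.

```python
import hashlib
from typing import Sequence


def select_proxy_server(url: str, servers: Sequence[str]) -> str:
    """Map a URL to exactly one proxy server via a stable hash (hash modulo N)."""
    if not servers:
        raise ValueError("at least one proxy server is required")
    # SHA-256 yields the same digest in every process, unlike Python's built-in hash().
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    servers = ["proxy-1", "proxy-2", "proxy-3"]  # hypothetical server names
    for url in ("http://www.companyA.com/", "http://www.companyB.com/"):
        print(url, "->", select_proxy_server(url, servers))
```

Because a given URL always maps to the same server, its document is cached on only one proxy server. With a simple modulo scheme, however, changing the number of servers remaps most URLs; consistent hashing is a common alternative when servers are added or removed.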

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention is illustrated by way of example,
and not by way of limitation, in the figures of the
accompanying drawings, in which like reference numerals
refer to similar elements and in which:

FIG. 1 is a block diagram illustrating several clients 
connected to a proxy server in a network. 

FIG. 2 is a diagram illustrating a client according to one 
embodiment of the present invention. 

FIG. 3 is a block diagram of a server according to one 
embodiment of the present invention. 

FIG. 4 is a data flow diagram illustrating the interaction
of proxy components according to one embodiment of the 
present invention. 

FIG. 5A is a depiction of an exemplary site tracking list
according to one embodiment of the present invention.

FIG. 5B is a depiction of an exemplary per site hit
database according to one embodiment of the present
invention.

FIG. 6 is a logical view of an exemplary directory
structure of a remote server.

FIG. 7 is a flow diagram illustrating a method of
performing hit accumulation according to one embodiment of
the present invention.

FIG. 8 is a flow diagram illustrating a method of hit
reporting according to one embodiment of the present
invention.

FIG. 9 is a data flow diagram illustrating the interaction
of proxy components according to another embodiment of
the present invention.

FIG. 10 is a flow diagram illustrating a method of
dispatching requests to segregate the storage of documents
according to one embodiment of the present invention.

DETAILED DESCRIPTION

A method and apparatus are described for providing
mirrored site administrators with the number of hits from a
proxy's document cache and for maintaining a more efficient
document caching scheme in a client-server computer
network. In the following description, for purposes of
explanation, numerous specific details are set forth in order
to provide a thorough understanding of the present
invention. It will be evident, however, to one skilled in the
art that the present invention may be practiced without these
specific details. Further, in other instances, well-known
structures and devices are shown in block diagram form.

The present invention includes various steps, which will
be described below. The steps can be embodied in
machine-executable instructions, which can be used to cause
a general-purpose or special-purpose processor programmed
with the instructions to perform the steps. Alternatively, the
steps of the present invention might be performed by
specific hardware components that contain hardwired logic
for performing the steps, or by any combination of
programmed computer components and custom hardware
components.

While embodiments of the present invention will be
described with respect to HTML documents, the method and
apparatus described herein are equally applicable to other
types of documents such as text files, images (e.g., JPEG and
GIF), audio files (e.g., .WAV, .AU, and .AIFF), video files
(e.g., .MOV and .AVI), and other document types commonly
found on the Web.

System Overview

The present invention may be included in a system,
known as WebTV™, for providing a user with access to the
Internet. A user of a WebTV™ client generally accesses a
WebTV™ server via a direct-dial telephone (POTS, for
"plain old telephone service"), ISDN (Integrated Services
Digital Network), or other similar connection, in order to
browse the Web, send and receive electronic mail (e-mail),
and use various other WebTV™ network services. The
WebTV™ network services are provided by WebTV™
servers using software residing within the WebTV™ servers
in conjunction with software residing within a WebTV™
client.

FIG. 1 illustrates a basic configuration of the WebTV™
network according to one embodiment. A number of
WebTV™ clients 1 are coupled to a modem pool 2 via
direct-dial, bi-directional data connections 29, which may be
telephone (POTS, i.e., "plain old telephone service"), ISDN
(Integrated Services Digital Network), or any other similar
type of connection. The modem pool 2 is typically coupled
through a router, such as that conventionally known in the
art, to a number of remote servers 4 via a conventional
network infrastructure 3, such as the Internet. The WebTV™
system also includes a WebTV™ server 5, which
specifically supports the WebTV™ clients 1. The WebTV™
clients 1 each have a connection to the WebTV™ server 5
either directly or through the modem pool 2 and the
Internet 3. Note that the modem pool 2 is a conventional
modem pool, such as those found today throughout the
world providing access to the Internet and private networks.

In order to facilitate explanation, the WebTV™ server 5 is
generally discussed as if it were a single device, and
functions provided by the WebTV™ services are generally
discussed as being performed by such single device.
However, the WebTV™ server 5 may comprise physical and
logical devices connected in a distributed architecture, and
the various functions discussed below which are provided by
the WebTV™ services may actually be distributed among
multiple WebTV™ server devices.

An Exemplary Client System

FIG. 2 illustrates a WebTV™ client 1. The WebTV™
client 1 includes an electronics unit 10 (hereinafter referred
to as "WebTV™ box 10"), an ordinary television set 12,
and a remote control 11. In an alternative embodiment of the
present invention, the WebTV™ box 10 is built into the
television set 12 as an integral unit. The WebTV™ box 10
includes hardware and software for providing the user with
a graphical user interface, by which the user can access the
WebTV™ network services, browse the Web, send e-mail,
and otherwise access the Internet.

The WebTV™ client 1 uses the television set 12 as a
display device. The WebTV™ box 10 is coupled to the
television set 12 by a video link 6. The video link 6 is an RF
(radio frequency), S-video, composite video, or other
equivalent form of video link. In the preferred embodiment,
the client 1 includes both a standard modem and an ISDN
modem, such that the communication link 29 between the
WebTV™ box 10 and the server 5 can be either a telephone
(POTS) connection 29a or an ISDN connection 29b. The
WebTV™ box 10 receives power through a power line 7.
Remote control 11 is operated by the user in order to
control the WebTV™ client 1 in browsing the Web, sending
e-mail, and performing other Internet-related functions. The
WebTV™ box 10 receives commands from remote control
11 via an infrared (IR) communication link. In alternative
embodiments, the link between the remote control 11 and the
WebTV™ box 10 may be RF or any equivalent mode of
transmission.

Exemplary Server System

The WebTV™ server 5 generally includes one or more
computer systems generally having the architecture
illustrated in FIG. 3. It should be noted that the illustrated
architecture is only exemplary; the present invention is not
constrained to this particular architecture. The illustrated
architecture includes a central processing unit (CPU) 50,
random access memory (RAM) 51, read-only memory
(ROM) 52, a mass storage device 53, a modem 54, a network
interface card (NIC) 55, and various other input/output (I/O)
devices 56. Mass storage device 53 includes a magnetic,
optical, or other equivalent storage medium. I/O devices 56
may include any or all of devices such as a display monitor,
keyboard, cursor control device, etc. Modem 54 is used to
communicate data to and from remote servers 4 via the
Internet.
As noted above, the WebTV™ server 5 may actually
comprise multiple physical and logical devices connected in
a distributed architecture. Accordingly, NIC 55 is used to
provide data communication with other devices that are part
of the WebTV™ services. Modem 54 may also be used to
communicate with other devices that are part of the
WebTV™ services and which are not located in close
geographic proximity to the illustrated device.

An Exemplary Proxy

FIG. 4 illustrates the caching and hit accumulation
features of the WebTV™ proxy 400 according to one
embodiment of the present invention. In this embodiment,
one or more WebTV™ servers 5 may act as a proxy 400 in
providing the WebTV™ client 1 with access to the Web and
other WebTV™ services. More specifically, WebTV™
server 5 functions as a "caching proxy." In this example,
proxy 400 includes a proxy server 405 and a hit accumulator
server 415. Client requests that are serviced from the proxy
server's local document cache 465 are communicated to the
hit accumulator server 415. As will be described below, the
hit accumulator server 415 maintains and organizes the data
so as to provide hit tracking information to remote site
administrators such as remote site administrator 480.
Remote site administrator 480 may include entities such as
persons authorized to gather statistical data for the remote
site, persons authorized to manage and maintain the remote
site, the remote site itself, or an automated computer system
or other device configured to receive statistical data for the
remote site.

In this embodiment, the proxy server 405 includes a
proxy request processor 410, a document cache 465, a
document database 461, and a transcoder 466. The proxy
request processor 410 receives requests from the WebTV™
client 1 and sends responses to the WebTV™ client 1. The
proxy request processor 410 maintains the document
database 461 and the document cache 465, and further
determines when transcoding will be performed. The
document cache 465 is used for temporary storage of images,
text files, audio files, video files, and other information
which is used frequently by either the WebTV™ client 1 or
the proxy server 405.

When a document request is received from a client, the
proxy request processor 410 determines whether to service
the request from the document cache 465 by performing a
search of the document cache 465. If the document is found
locally, then the document may be retrieved from the
document cache 465 and transferred to the client with the
response. However, if the requested document is not found,
then the proxy request processor 410 requests the document
from the appropriate site, and upon receipt the proxy request
processor 410 provides the document to the client with the
response. Further, the proxy request processor 410
anticipates subsequent requests by storing the document in
the document cache 465.

When a document is retrieved by the proxy server 405
from a remote server 4, for example, detailed information on
this document may be stored in the document database 461.
The stored information may subsequently be used by the
proxy server 405 to speed up processing and downloading of
that document in response to future requests for that
document. In addition, the transcoding functions and various
other functions of the WebTV™ service may be facilitated
by making use of information stored in the document
database 461. For example, the document database 461 may
include certain historical and diagnostic information for
Web pages that have been accessed by a WebTV™ client 1.

Document transcoder 466 is used to automatically revise
the code of Web documents retrieved from the remote
servers 4, for purposes such as: (1) correcting bugs in
documents; (2) correcting undesirable effects which occur
when a document is displayed by the client 1; (3) improving
the efficiency of transmission of documents from the server
5 to the client 1; (4) matching hardware decompression
technology within the client 1; (5) resizing images to fit on
the television set 12; (6) converting documents into other
formats to provide compatibility; (7) reducing latency
experienced by a client 1 when displaying a Web page with
in-line images (images displayed in text); and (8) altering
documents to fit into smaller memory spaces.

In one embodiment, hit accumulator server 415 may act
as a Web server providing a Hypertext Transfer Protocol
(HTTP) interface by which remote site administrators can
access the accumulated hits for their sites by way of a Web
browser. The hit accumulator server 415 may include a hit
log 420, a hit accumulator processor 430, a site tracking list
425, a hit report processor 450, and a per site hit database
440. One method of communicating hits from the proxy
server to the hit accumulator server 415 is through a
common storage device such as hit log 420. This and other
methods of communicating hits will be described below.
Regardless of how hits are communicated to the hit
accumulator server 415, a process such as the hit
accumulator processor 430 is desirable to verify the hits
against a list of locations that are to be monitored. Such a
list of locations may be stored in the site tracking list 425,
for example. A location, in this context, refers to the
location of a document. The location may be represented by
a URL, a directory path, or other mechanisms for uniquely
identifying a particular document. Hits that are validated by
the hit accumulator processor 430 are recorded in the per
site hit database 440. The per site hit database 440 may have
a count of the hits for each location listed in the site
tracking list 425.

In this embodiment, the hit report processor 450 may receive
requests from remote site administrators, such as remote site
administrator 480, for hit reports. The hit reports can be
extracted from the per site hit database 440 and transmitted
to the requester in an HTML format.

While in this embodiment the proxy server 405 and the hit
accumulation server 415 have been shown as separate
servers, the functionality of both could be combined into one
WebTV™ server 5. Additionally, the proxy 400 might be
expanded to include more than one proxy server 405. When
expanding the proxy 400 to include more than one proxy
server 405, only one hit accumulation server 415 is needed.

In alternative embodiments, hits may be communicated
by a proxy server 405 to the hit accumulation server 415 by
way of a network connection, such as a permanent
connection through which events may be sent. Also,
message passing may be employed whereby the proxy server
405 sends a message, such as a datagram, to the hit
accumulator 415 to notify it of a document hit. It is
appreciated that many other means of communicating
information between servers are possible.

An Exemplary Site Tracking List

FIG. 5A illustrates an exemplary site tracking list
according to one embodiment of the present invention. This
illustration depicts a site tracking list 425 including site
tracking list records 505 for three remote sites: (1)
http://www.companyA.com/; (2) http://www.companyB.com/;
and (3) http://www.companyC.com/. In this embodiment,






5,935,207 


each site tracking list record 505 may include a list of one
or more URL patterns 510.

The list of URL patterns 510 may be a list of strings
identifying the initial portions (e.g., prefixes) of URLs to be
tracked. In this example, the proxy 400 tracks hits for
documents identified by URLs with a prefix that matches
any of the URL patterns 510 specified in one of the site
tracking list records 505. The hits may then be logged to a
record in the per site hit database 440 corresponding to the
site tracking list record 505 which contained the matching
URL pattern. This form of URL pattern is useful for tracking
hits for a particular grouping of Web pages beginning with
the same initial sequence of characters. For example, the
URLs for the Web pages of Company A might all begin with
"http://www.company_A.com/." Additionally, the Web
pages associated with products produced by Company A
might all begin with the sequence
"http://www.company_A.com/product/." Furthermore, pages
related to a particular product might all begin with the URL
prefix "http://www.company_A.com/product/<product_name>/",
where <product_name> identifies the particular product. To
track the hits for pages relating to Company A's Gizmo
product line, therefore, the following URL pattern may be
used: "http://www.company_A.com/product/Gizmo/".
Similarly, to track the hits for all of Company A's products,
the following URL pattern may be used:
"http://www.company_A.com/product/."

URL patterns are not limited to prefixes; other forms of
URL patterns may be used, such as patterns including wild
card or other special characters, or patterns in the form of
standard regular expressions.

An Exemplary Per Site Hit Database

FIG. 5B illustrates an exemplary per site hit database
according to one embodiment of the present invention.
Based on the information provided in the site tracking list
425 of FIG. 5A, an exemplary per site hit database might be
represented as per site hit database 440. In this example, the
per site hit database 440 includes three site hit records 515
corresponding to remote sites for CompanyA, CompanyB,
and CompanyC.

In this embodiment, each site hit record 515 includes a
timestamp 525. The timestamp 525 may indicate the time
from which the hits have been accumulated. In this example,
therefore, there have been six hits to the monitored URLs
since Jan. 16, 1997 at 10:01:58. Those of skill in the art will
appreciate that the timestamp 525 may represent the period
of accumulation in other ways, such as the elapsed time since
the last hit report was generated.

Site hit records 515 also include a remote site name 530.
The remote site names 530 from front to back correspond to
CompanyA, CompanyB, and CompanyC. Site hit record 515
further includes a list of hits 520. In this embodiment, the list
of hits 520 includes the URLs of the documents that were
requested and subsequently serviced from the proxy's local
cache (e.g., document cache 465) since the time indicated by
the timestamp 525. According to the site hit record 515 for
CompanyA, the ad1.html Web page has been requested and
serviced from the proxy's local cache three times. Similarly,
the sales.html and Q1.html Web pages have been provided
from the proxy's cache once and twice, respectively. Based
upon the accumulated hit information in a particular site hit
record 515, a detailed hit report may be provided to the
corresponding remote site administrator. Hit accumulation
will be discussed further below.

FIG. 6 is a logical view of an exemplary directory
structure 600 that may exist on a remote server 4. This
exemplary directory structure 600 illustrates the need for a
flexible method of tracking the number of hits. Web pages
might reside in any or all of the directories shown. In this
example, the URL patterns within a site tracking list record
505 may identify a particular directory or directories in the
hierarchy depicted.

The remote site administrator for CompanyA may want to
know the number of hits in an Ads subdirectory 605 and an
Events subdirectory 610. This may be due to the fact that
advertising banners are shown on Web pages in these
directories and the advertisers may want feedback on how
many Web viewers are seeing their ads. Alternatively, the
company may have its own business reasons for analyzing
statistics in certain areas of its Web site. Regardless, it is
apparent that simply tracking all hits for a root directory 615
on the company's server is insufficient; for example, hits
would be tracked for directories in which the remote site
administrator had no interest. A list of URL patterns is used
to accommodate the flexibility desired. The following URL
patterns may be stored in the site tracking list 425 for
CompanyA to track the above-mentioned subdirectories:
"http://www.companyA.com/products/Events/" and
"http://www.companyA.com/products/Ads/." The list of URL
patterns 510 in each site tracking list record 505 allows a
remote site to enumerate, for example, the specific
directories in which it would like to track user hits.

The advantages of providing forms of URL patterns with
wild cards become apparent with reference to the directory
structure 600. Assume the "*" character is a wild card; that
is, it matches zero or more characters. Since CompanyA has
two subdirectories with press releases, a convenient way to
track hits in both is with the following URL pattern:
"http://www.companyA.com/*press_releases/". Without the
use of a wild card, the equivalent URL patterns are as
follows: "http://www.companyA.com/press_releases/" and
"http://www.companyA.com/products/press_releases/." Thus,
it should be appreciated that wild cards and similar special
characters allow greater efficiency and flexibility in the
specification of URL patterns.

Hit Accumulation

FIG. 7 is a flow diagram illustrating a method of
performing hit accumulation according to one embodiment of
the present invention. In this embodiment, each site hit
record 515 begins in an initial state having an indication of
the remote site (e.g., the name 530) and a timestamp 525
representing the time at which hit accumulation began.
Initially, the hit accumulation server 415 waits for an
indication that a client request has been serviced from the
proxy's local cache (step 710). For example, the hit
accumulator processor 430 may determine that a new entry
has been made to the hit log 420.

Upon receiving an indication that the proxy 400 has
served up a cached response, the hit accumulation server 415
determines if the URL of the document retrieved from the
proxy's local cache is one whose hits are to be tracked. As
discussed above, not all hits are tracked. In this embodiment,
hits are tracked only for documents matching URL patterns
that have been registered in a tracking list such as the site
tracking list 425, discussed above. Therefore, the hit
accumulator processor 430 compares the URL of the
retrieved document to URL patterns 510 in each site tracking
list record 505 to determine if the hit will be recorded in the
per site hit database 440 (step 720). If no URL patterns 510
match the retrieved document, the hit is ignored. Otherwise,
if the retrieved document matches any of the URL patterns
510, then the appropriate site hit record 515 in the per site
hit database 440 is updated (step 730).
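The comparison of step 720 might be sketched as follows, assuming that each URL pattern is treated as a prefix in which a "*" wild card matches zero or more characters, as in the press-release example above. The names pattern_to_regex, find_tracked_site, and SITE_TRACKING_LIST, and the tracking list contents, are hypothetical; full regular-expression patterns, also contemplated above, are not handled by this sketch.

```python
import re
from typing import Dict, List, Optional


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a URL pattern; '*' matches zero or more characters, all else is literal."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def find_tracked_site(url: str, tracking_list: Dict[str, List[str]]) -> Optional[str]:
    """Return the site whose URL patterns match the start of url, or None (hit ignored)."""
    for site, patterns in tracking_list.items():
        for pattern in patterns:
            # re.match anchors only at the beginning, so each pattern acts as a prefix.
            if pattern_to_regex(pattern).match(url):
                return site
    return None


# Hypothetical tracking list modeled on the CompanyA examples in the text.
SITE_TRACKING_LIST: Dict[str, List[str]] = {
    "CompanyA": [
        "http://www.companyA.com/products/Events/",
        "http://www.companyA.com/products/Ads/",
        "http://www.companyA.com/*press_releases/",
    ],
    "CompanyB": ["http://www.companyB.com/"],
}

if __name__ == "__main__":
    print(find_tracked_site("http://www.companyA.com/products/press_releases/q1.html",
                            SITE_TRACKING_LIST))  # CompanyA
    print(find_tracked_site("http://www.companyA.com/index.html",
                            SITE_TRACKING_LIST))  # None
```

Because the wild card may match nothing, the single pattern ending in "*press_releases/" covers both press-release directories of the example.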

Update of the site hit record 515 can be explained briefly
with respect to FIG. 5B. In this embodiment, the appropriate
site hit record 515 is searched for an entry that matches the
URL of the retrieved document. If the retrieved document's
URL does not already exist in the list of hits 520 for the site
hit record 515, then the URL is added and its count is set to
one, since this is the document's first hit. However, if the
retrieved document's URL was already in the list of hits 520
(meaning it has had at least one previous hit), then only the
corresponding count needs to be incremented. In this
manner, each document retrieved from the proxy's local
cache that matches a tracked URL pattern will have an entry
in the list of hits 520 with a corresponding count indicating
the number of cache hits.
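A minimal sketch of a site hit record of FIG. 5B and of the update of step 730 follows. The class and field names are hypothetical and merely mirror the remote site name 530, timestamp 525, and list of hits 520; a Counter provides the "add with a count of one, otherwise increment" behavior in a single operation. The sample URLs are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SiteHitRecord:
    """Hypothetical in-memory form of a site hit record 515 (FIG. 5B)."""
    name: str                                       # remote site name 530
    timestamp: datetime                             # start of accumulation, timestamp 525
    hits: Counter = field(default_factory=Counter)  # list of hits 520: URL -> count

    def record_hit(self, url: str) -> None:
        """Step 730: a new URL starts at a count of one; a known URL is incremented."""
        self.hits[url] += 1

    def total_hits(self) -> int:
        return sum(self.hits.values())


if __name__ == "__main__":
    record = SiteHitRecord("CompanyA", datetime(1997, 1, 16, 10, 1, 58))
    served_from_cache = (["http://www.companyA.com/ad1.html"] * 3
                         + ["http://www.companyA.com/sales.html"]
                         + ["http://www.companyA.com/Q1.html"] * 2)
    for url in served_from_cache:
        record.record_hit(url)
    print(record.total_hits())  # 6
```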

Hit Reporting

Referring now to FIG. 8, a method of hit reporting
according to one embodiment of the present invention is
illustrated. In this embodiment of the present invention, the
hit accumulator server 415, in addition to its other
responsibilities, acts as a Web server providing an HTTP
interface by which remote site administrators can access the
accumulated hits for their respective tracked sites. The hit
report processor 450 waits until a request is received from a
remote site administrator (step 810). Preferably, the HTTP
address on the hit accumulation server 415 can be used to
identify the requester of the information. For example, the
hit report for Company A might be accessed on the hit
accumulation server 415 at
"http://www.webtv.net/hits/company_a."

To limit access to the hit reports, a secure communication
technology such as Secure Sockets Layer (SSL) or another
available secure communication protocol can be used to
keep the hit information private by encrypting
communications across the network. Additionally, the report
requests can be authenticated to ensure that only a particular
remote server or individual can access the information (step
820).
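Requester identification and authentication (steps 810 and 820) might look roughly like the sketch below. The source names SSL for transport security but does not say how report requests are authenticated; the per-site shared token in the hypothetical `SITE_TOKENS` table, and the `site_for_request` helper, are assumptions. The encryption layer itself would be provided by the Web server and is not shown.

```python
import hmac
from typing import Optional
from urllib.parse import urlparse

# Hypothetical per-site shared secrets; how they are provisioned is not shown.
SITE_TOKENS = {"company_a": "s3cret-a"}
REPORT_PREFIX = "/hits/"


def site_for_request(url: str, token: str) -> Optional[str]:
    """Return the site named by a report URL if the caller's token is valid.

    For example, http://www.webtv.net/hits/company_a names site "company_a".
    Returns None for an unknown path, an unknown site, or a bad token.
    """
    path = urlparse(url).path.rstrip("/")
    if not path.startswith(REPORT_PREFIX):
        return None
    site = path[len(REPORT_PREFIX):]
    expected = SITE_TOKENS.get(site)
    if expected is None:
        return None
    # compare_digest compares in constant time, so response timing does not
    # reveal how much of a guessed token was correct.
    if not hmac.compare_digest(expected.encode(), token.encode()):
        return None
    return site


print(site_for_request("http://www.webtv.net/hits/company_a", "s3cret-a"))  # company_a
print(site_for_request("http://www.webtv.net/hits/company_a", "wrong"))     # None
```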

Once a request has been received from a remote site
administrator and it has been optionally authenticated, a
report can be generated from the accumulated hit data, such
as the list of hits 520 for the particular site hit record 515
(step 830). In this embodiment, the report may include a list
of URLs and their corresponding counts since the last report.
For convenient access via the Web, the report may be
formatted in HTML. Also, for the convenience of
the remote site administrator, a timestamp that identifies the
starting point of the accumulation may be included in the
report. The URL list may be specific to the document level,
thereby allowing the remote site administrator to determine
the number of hits for individual documents; however, it
may also be helpful to additionally summarize the hits by
directory, for example. It will be recognized that numerous
other ways of formatting and arranging the hit reports are
possible.
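One possible way to produce the HTML report of step 830, with per-document counts, a per-directory summary, and the accumulation start time, is sketched below. The `format_hit_report` function and its table layout are illustrative assumptions; the source leaves the report format open.

```python
import html
import time
from collections import Counter
from posixpath import dirname
from urllib.parse import urlparse


def format_hit_report(hits: dict[str, int], since: float) -> str:
    """Render per-URL cache-hit counts and a per-directory summary as HTML.

    `hits` maps URL -> count; `since` is the accumulation start time
    (seconds since the epoch), shown so the reader knows the period covered.
    """
    started = time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime(since))
    url_rows = "".join(
        f"<tr><td>{html.escape(url)}</td><td>{count}</td></tr>"
        for url, count in sorted(hits.items())
    )
    # Summarize by the directory part of each URL's path.
    by_dir: Counter = Counter()
    for url, count in hits.items():
        by_dir[dirname(urlparse(url).path) or "/"] += count
    dir_rows = "".join(
        f"<tr><td>{html.escape(d)}</td><td>{n}</td></tr>"
        for d, n in sorted(by_dir.items())
    )
    return (
        f"<html><body><h1>Cache hits since {started}</h1>"
        f"<table><tr><th>URL</th><th>Hits</th></tr>{url_rows}</table>"
        "<h2>Hits by directory</h2>"
        f"<table><tr><th>Directory</th><th>Hits</th></tr>{dir_rows}</table>"
        "</body></html>"
    )
```

Summarizing by directory only re-aggregates the same per-URL counts, so both views always agree.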

After the report has been formatted, the response contain- 
ing the report is transmitted to the remote site administrator 
(step 840). 

In this embodiment, before resuming the hit accumulation
of FIG. 7, the accumulated data in the site hit record 515 is
cleared (step 850) and the timestamp 525 is reset to reflect
the current time. The above steps for retrieving a report from
the proxy may be repeated periodically at the convenience of
the remote site administrator whenever an accurate total hit
count is desired.
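Clearing the record and resetting the timestamp (step 850) can be combined with reading the counts, so that no hit is lost or counted twice if accumulation continues while a report is being built. The sketch below does this under a lock; the `HitCounter` class is hypothetical, and the locking assumes a multi-threaded accumulator, which the source does not describe.

```python
import threading
import time


class HitCounter:
    """Hypothetical per-site accumulator supporting report-then-clear."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._hits: dict[str, int] = {}
        self._since = time.time()  # start of the current accumulation period

    def add(self, url: str) -> None:
        with self._lock:
            self._hits[url] = self._hits.get(url, 0) + 1

    def snapshot_and_reset(self) -> tuple[dict[str, int], float]:
        """Return (counts, start time) for the period just ended, then start a new one."""
        with self._lock:
            hits, since = self._hits, self._since
            self._hits = {}            # clear the accumulated data
            self._since = time.time()  # reset the timestamp to the current time
        return hits, since


c = HitCounter()
c.add("http://www.company-a.com/index.html")
c.add("http://www.company-a.com/index.html")
counts, since = c.snapshot_and_reset()
print(counts)  # {'http://www.company-a.com/index.html': 2}
```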

In alternative embodiments, hit reports may be provided 
to remote sites in a number of other ways. Hit reports need 
not be initiated by a request from the remote site adminis- 
trator. For example, the proxy may periodically send unso- 
licited hit reports via e-mail, the proxy may periodically 
download hit updates to a device specified by the remote site 
administrator, or the hit reports might be transmitted to 
remote site administrators in the form of datagrams. In any 
event, the assignees of the present invention appreciate that a
variety of reporting mechanisms are possible.

Allocation of Cache Space Within a Proxy 

FIG. 9 is a data flow diagram illustrating the interaction 
of proxy components according to another embodiment of 
the present invention. In this embodiment, proxy 900 
includes a plurality of proxy servers 405 communicatively 
coupled to a dispatcher 910 and a hit accumulator server 
415. Rather than allowing a given proxy server's cached 
contents to be determined based upon the requests of an 
associated client, the content of the Web can be distributed 
among proxy servers 405 by a hash algorithm executed by 
the dispatcher 910. The hash algorithm preferably maps a 
given URL to one and only one of the plurality of proxy 
servers 405. This can be accomplished using a portion of the
output of a secure hash algorithm such as the Message 
Digest 5 (MD5) hash algorithm. The hash algorithm can be 
thought of as a mechanism for assigning a range of URLs to 
each of the proxy servers 405 in the proxy 900. 
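The URL-to-server mapping can be sketched as follows, taking the first four bytes of the MD5 digest as the "portion" of the hash output. The function name `proxy_index`, the byte count, and the server names are assumptions; any fixed portion of the digest would serve.

```python
import hashlib


def proxy_index(url: str, num_servers: int) -> int:
    """Map a URL to exactly one of `num_servers` proxy servers.

    A fixed portion of the MD5 digest (here its first 4 bytes) is read as an
    unsigned integer and reduced modulo the number of servers, so the same
    URL always lands on the same server.
    """
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_servers


servers = ["proxy-0", "proxy-1", "proxy-2"]  # hypothetical server names
url = "http://www.company-a.com/index.html"
print(servers[proxy_index(url, len(servers))])
```

With a plain modulo, changing the number of proxy servers remaps most URLs; the source does not address how servers are added or removed.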

In this embodiment, the dispatcher 910 receives document 
requests including URLs from a client such as WebTV™ 
client 1. Based upon the URL in the request, the dispatcher 
determines the proxy server 405 in which the document 
should be cached and forwards the client request to that 
proxy server 405. If the document requested by the client is 
not found in the proxy server's local document cache 465, 
then the proxy server 405 requests the document from the 
appropriate server (e.g., a remote server) and caches the 
document when it is received from the server. 

If redundancy is desired, the hashed result of a URL may 
be used to identify a cluster of two or more proxy servers 
rather than a single proxy server 405. In this manner, the 
load required to support a popular document can be shared 
among a group of proxy servers. 
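A sketch of the redundant variant follows, assuming a cluster is formed from the hashed server plus its next neighbors in index order, and that load is shared by picking a cluster member at random. Both choices, and the helper names `proxy_cluster` and `pick_server`, are illustrative assumptions.

```python
import hashlib
import random


def proxy_cluster(url: str, num_servers: int, replicas: int = 2) -> list[int]:
    """Map a URL to a cluster of `replicas` distinct proxy servers.

    The hash picks the first member; the cluster is that server plus the next
    replicas - 1 servers in index order, wrapping around.
    """
    if not 1 <= replicas <= num_servers:
        raise ValueError("replicas must be between 1 and num_servers")
    digest = hashlib.md5(url.encode("utf-8")).digest()
    first = int.from_bytes(digest[:4], "big") % num_servers
    return [(first + i) % num_servers for i in range(replicas)]


def pick_server(url: str, num_servers: int, replicas: int = 2) -> int:
    """Share the load for one URL by choosing a cluster member at random."""
    return random.choice(proxy_cluster(url, num_servers, replicas))
```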

In an alternative embodiment, a decentralized dispatching 
scheme can be implemented. For example, the proxy servers 
405 may be arranged to form an interconnected ring con- 
figuration and the functionality of the dispatcher 910 may be 
incorporated into each proxy server 405. In this 
embodiment, the client document requests may be initially 
handled by one of the proxy servers 405 in the ring. If the 
requested document is not found in the local cache of the 
initial proxy server, the initial proxy server may forward the 
request to the appropriate proxy server based on the hashing 
scheme discussed above. 
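The decentralized variant can be sketched as below: each hypothetical `RingProxy` first checks its own cache and otherwise forwards the request to the ring member selected by an MD5-based hash of the URL. The class, its `fetch` callback standing in for retrieval from the remote server, and the hash details are assumptions for illustration.

```python
import hashlib
from typing import Callable


class RingProxy:
    """Hypothetical proxy server in a decentralized ring.

    `ring` lists every member, including this one, in the same order on all
    servers, so every member computes the same owner for a given URL.
    """

    def __init__(self, index: int, ring: list, fetch: Callable[[str], bytes]) -> None:
        self.index = index
        self.ring = ring
        self.fetch = fetch  # retrieves a document from its remote server
        self.cache: dict[str, bytes] = {}

    def owner(self, url: str) -> int:
        digest = hashlib.md5(url.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.ring)

    def handle(self, url: str) -> bytes:
        if url in self.cache:  # the receiving server checks its own cache first
            return self.cache[url]
        owner = self.owner(url)
        if owner != self.index:  # not ours: forward to the server the hash selects
            return self.ring[owner].handle(url)
        document = self.fetch(url)  # we own this URL: fetch it and cache a copy
        self.cache[url] = document
        return document


fetched = []


def origin(url: str) -> bytes:  # stand-in for the remote server
    fetched.append(url)
    return b"<html>" + url.encode() + b"</html>"


ring: list = []
for i in range(3):
    ring.append(RingProxy(i, ring, origin))

url = "http://www.company-a.com/index.html"
ring[0].handle(url)
ring[1].handle(url)
print(len(fetched))  # 1: only the owning server fetched the document
```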

FIG. 10 is a flow diagram illustrating a method of 
dispatching requests to segregate the storage of documents 
according to one embodiment of the present invention. 
While both a centralized and a decentralized request dis- 
patching mechanism have been discussed above, the method 
described below is generally applicable to both. In this 
embodiment, initially, a document request is received from 
a client (step 1010). 

If a centralized dispatcher such as dispatcher 910 receives
the request, then an appropriate proxy server is determined
from the output of the hash algorithm applied to the URL
(step 1020).

However, in a decentralized dispatching environment, the
initial proxy server receiving the client request may assume
it is the appropriate proxy server and first check its local
document cache 465. If the document is not present, then the
proxy server may perform the hash algorithm on the URL to
determine which of the remaining proxy servers is appropriate
for the request (step 1020).

After determining the proxy server appropriate for the
client request, the request is forwarded to that proxy server
(step 1030). The proxy server 405 attempts to service the
request from its local document cache 465. If a cache hit
occurs, then the document is immediately available from the
proxy server's local document cache 465. However, if a
cache miss occurs, the proxy server 405 will retrieve the
document from an appropriate server and store a copy
locally. In either event, the centralized or decentralized
dispatching mechanism ultimately receives a response from
the server (e.g., the document requested by the client) (step
1040). Finally, the response, typically in the form of an
HTML document, is forwarded to the client (step 1050). This
method of caching documents segregates the content of the
Web based upon the URL of the documents. Since each URL
maps to only one proxy server 405, this approach
advantageously allocates the proxy's cache space more
efficiently by avoiding unnecessary redundancy.
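The method of FIG. 10 for a centralized dispatcher can be sketched as follows, with comments keyed to the step numbers. The `ProxyServer` and `Dispatcher` classes, the `fetch` callback, and the MD5-based mapping are assumptions made for the example; a decentralized ring would perform the same steps inside each proxy server.

```python
import hashlib
from typing import Callable


class ProxyServer:
    """Hypothetical proxy server 405 with a local document cache 465."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self.fetch = fetch  # retrieves a document from the appropriate server
        self.cache: dict[str, bytes] = {}

    def serve(self, url: str) -> bytes:
        if url not in self.cache:  # cache miss: retrieve and store a local copy
            self.cache[url] = self.fetch(url)
        return self.cache[url]  # cache hit, or the freshly stored copy


class Dispatcher:
    """Hypothetical centralized dispatcher 910."""

    def __init__(self, servers: list) -> None:
        self.servers = servers

    def handle(self, url: str) -> bytes:
        # Step 1010: a document request arrives from a client.
        digest = hashlib.md5(url.encode("utf-8")).digest()
        # Step 1020: the hash output selects exactly one proxy server.
        index = int.from_bytes(digest[:4], "big") % len(self.servers)
        # Steps 1030-1040: forward the request and receive the response.
        response = self.servers[index].serve(url)
        # Step 1050: the response is returned for delivery to the client.
        return response


fetched = []


def origin(url: str) -> bytes:  # stand-in for the remote server
    fetched.append(url)
    return b"<html>" + url.encode() + b"</html>"


dispatcher = Dispatcher([ProxyServer(origin) for _ in range(3)])
dispatcher.handle("http://www.company-a.com/index.html")
dispatcher.handle("http://www.company-a.com/index.html")
print(len(fetched))  # 1: the second request is served from the cache
```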

In the foregoing specification, the invention has been
described with reference to specific embodiments thereof. It
will, however, be evident that various modifications and
changes may be made thereto without departing from the
broader spirit and scope of the invention. The specification
and drawings are, accordingly, to be regarded in an
illustrative rather than a restrictive sense.

What is claimed is: 

1. A method of tracking hits in a proxy, the proxy
including a document cache having stored therein recently
requested documents, the proxy coupled to a client and to a
remote server, the method comprising the steps of:

the proxy maintaining information regarding client
requests that are serviced from the document cache;
the proxy receiving a client request for a document;
the proxy determining whether to service the client
request from its document cache or whether to forward
the client request to another server;
updating a count if the client request is serviced from the
document cache; and
the proxy providing the information to a remote site
administrator.

2. The method of claim 1, wherein the information 
includes a count representing the number of times client 
requests for a particular document have been serviced from 
the document cache. 

3. The method of claim 2, wherein the information
includes a timestamp identifying a time period to which the
count corresponds.

4. The method of claim 1, wherein the remote site
administrator requests the information from the proxy
thereby initiating the step of providing the information to a
remote site administrator.

5. The method of claim 4, wherein the information 
provided to the remote site administrator is in the form of a
Hypertext Mark-up Language (HTML) report.

6. The method of claim 4, wherein the proxy authenticates
the remote site administrator's request prior to the step of
providing the information to a remote site administrator.

7. The method of claim 1, wherein the step of providing 
the information to a remote site administrator further 
includes the step of the proxy transmitting unsolicited infor- 
mation from the proxy to the remote site administrator. 

8. In a system having a hit accumulation server and one 
or more proxy servers, each of said one or more proxy 
servers including a local cache having stored therein one or 
more cached documents, the proxy coupled to a client and 
to a remote server, a method of tracking requests for 
documents stored in a proxy comprising the steps of: 

a proxy server receiving a client request for a document; 

the proxy recording a hit for the document if the document 
is available from the proxy server's local cache, the 
proxy server notifying the hit accumulation server that 
the client request was serviced from the local cache, 
and the hit accumulation server recording the hit and a 
path of the document to a table if the document 
corresponds to one of a set of monitored Uniform 
Resource Locator (URL) patterns; and 

providing an indication of the number of hits for the 
documents to a remote site administrator. 

9. The method of claim 8, wherein the step of the proxy 
server notifying the accumulation server further includes the 
steps of: 

the accumulation server monitoring a common storage 
device, the common storage device accessible to the 
one or more proxy servers; and 

the proxy server logging an entry to the common storage 
device if the proxy server services the client request 
from its local cache, the entry including a URL for the 
document requested by the client. 

10. The method of claim 9, wherein the set of monitored 
URL patterns represents one or more directories on a remote 
site for which document hits are to be tracked, the method 
further including the steps of: 

the accumulation server detecting the entry; and 

the accumulation server comparing the URL in the entry
to the set of monitored URL patterns to determine
whether or not to record the hit.

11. A method of tracking hits by a proxy, the proxy 
including one or more proxy servers and a hit accumulation 
server, each of said one or more proxy servers including a 
local cache having stored therein recently requested 
documents, the proxy coupled to a client and to a remote 
server, the method comprising the steps of: 

a proxy server receiving a client request for a document; 
the proxy server determining whether to service the client
request from its local cache or whether to forward the
client request to another server;
inserting a new log entry onto a common storage if the
proxy server services the client request from its local
cache, the new log entry including a location of the
document;

the accumulation server detecting the new log entry by 
monitoring the common storage; 

the accumulation server comparing the location of the 
document to a predetermined set of directories to 
determine whether or not to record the hit; 

recording the hit if the location of the document matches 
a directory in the predetermined set of directories; and 

providing the number of hits for a set of documents 
located in a first subset of directories of the predeter- 
mined set of directories to a remote site administrator. 

12. The method of claim 11, wherein the common storage 
comprises a hit log. 

13. The method of claim 11, wherein the location of the
document comprises a Uniform Resource Locator (URL). 

14. A machine-readable medium having stored thereon 
data representing sequences of instructions, said sequences 
of instructions which, when executed by a processor, cause 
said processor to perform the steps of: 

maintaining information regarding client requests that are 
serviced from a document cache of a proxy server; 

receiving a client request for a document; 

determining whether to service the client request from the 
document cache or whether to forward the client 
request to another server;

updating a count if the client request is serviced from the
document cache; and

providing the information to a remote site administrator.

15. The machine-readable medium of claim 14, wherein 
the information includes a count representing the number of 
times client requests for a particular document have been 
serviced from the document cache. 

16. The machine-readable medium of claim 15, wherein
the information includes a timestamp identifying a time
period to which the count corresponds.

* * * * *
Col. 6, line 52, after "such as" insert --a--.

Col. 6, line 64, after "tracking list" change "435" to --425-- (See figure 4).
Col. 7, line 41, after "three" change "sit" to --site--.